Supervised Methods for Domain Classification of Tamil Documents

نویسندگان

  • Reshma U
  • Barathi Ganesh
چکیده

The Era of digitization induces the need of domainclassification in both the on-line and off-line applications. The necessity of automatic text classification arises for utilizing it in diverse fields. Hence various methodologies like Machine Learningalgorithms were proposed to do the same. Here automatic document classification of Tamil documents have been proposed by considering the exponential growth of Tamil text documents in the form of unstructured data available as News, Encyclopedias, E-books, E-Governance, Social Media and much more. Max-Ent, CRF and SVM algorithms are used here to achieve more than 90 percentage average accuracy in both the sentence and document level classification of Tamil text documents. In this work Dinakarannewspaper dataset from EMILLE/CIIL Corpus has been utilized to experiment the ability of Machine Learning algorithms in Tamil domain classification.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Fisher Discriminant Analysis (FDA), a supervised feature reduction method in seismic object detection

Automatic processes on seismic data using pattern recognition is one of the interesting fields in geophysical data interpretation. One part is the seismic object detection using different supervised classification methods that finally has an output as a probability cube. Object detection process starts with generating a pickset of two classes labeled as object and non-object and then selecting ...

متن کامل

Semi-Supervised Matrix Completion for Cross-Lingual Text Classification

Cross-lingual text classification is the task of assigning labels to observed documents in a label-scarce target language domain by using a prediction model trained with labeled documents from a label-rich source language domain. Cross-lingual text classification is popularly studied in natural language processing area to reduce the expensive manual annotation effort required in the target lang...

متن کامل

Web based classification of Tamil documents using ABPA

Automatic text classification based on vector space model (VSM), artificial neural networks (ANN), Knearest neighbor (KNN), N aives Bayes (NB) and support vector machine (SVM) have been applied on English language documents, and gained popularity among text mining and information retrieval (IR) researchers. This paper proposes the application of ANN for the classification of Tamil language docu...

متن کامل

Classification of text documents supervised by domain ontologies

The research objective is to establish an approach for supporting the classification of text documents referring to a specified domain. The focus is on the preliminary topic assignment to the documents used for training the model. The method implements domain ontology as background knowledge. The idea consists in extracting the preliminary topics for training the classifier by means of unsuperv...

متن کامل

Survey of robust artificial intelligence classifier proper for various digital data

Artificial intelligence or machine intelligence should be considered as the vast domain of junction of many knowledge, sciences and old and new technics. Today, classification of documents is adopted extensively in information recovery for organizing documents. In the method of document supervised classification some correct information about documents that previously have been classified are a...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2015